@pingSubhajit
Contributor

Summary

This PR introduces the eval harness battery, an optional module for systematically measuring retrieval quality in RAG pipelines. The harness allows users to define test datasets with ground truth relevance labels, run retrieval against them, and produce metrics (precision, recall, MRR) that quantify performance over time.

What's included

Core eval modules (packages/unrag/registry/eval/)

  • dataset.ts - Dataset parsing and validation for the eval dataset format
  • metrics.ts - Computation of retrieval metrics at K (precision, recall, MRR)
  • runner.ts - Orchestrates eval runs including document ingestion, query execution, and optional reranking
  • report.ts - Report generation in JSON and Markdown formats, including diff support for comparing runs
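For readers unfamiliar with the metrics listed above, here is a minimal sketch of how precision, recall, and reciprocal rank at K are commonly computed for one query and then averaged over a dataset. It is not the code in metrics.ts: the names `RetrievalMetrics`, `metricsAtK`, and `meanMetrics` are hypothetical, and the sketch assumes each retrieved ID appears at most once in the ranked list.

```typescript
// Illustrative only: these names are assumptions, not the harness's actual API.
type RetrievalMetrics = { precision: number; recall: number; mrr: number };

/**
 * Computes precision@k, recall@k, and reciprocal rank@k for a single query.
 * `retrieved` is the ranked list of document IDs (best first, assumed unique);
 * `relevant` is the set of ground-truth relevant IDs.
 */
function metricsAtK(
  retrieved: string[],
  relevant: Set<string>,
  k: number,
): RetrievalMetrics {
  if (!Number.isInteger(k) || k <= 0) {
    throw new RangeError("k must be a positive integer");
  }
  const topK = retrieved.slice(0, k);
  const hits = topK.filter((id) => relevant.has(id)).length;
  const firstHit = topK.findIndex((id) => relevant.has(id));
  return {
    // Denominator is k even when fewer than k results come back.
    precision: hits / k,
    // A query with no relevant documents contributes 0 recall.
    recall: relevant.size === 0 ? 0 : hits / relevant.size,
    // Reciprocal rank of the first relevant result within the top k, else 0.
    mrr: firstHit === -1 ? 0 : 1 / (firstHit + 1),
  };
}

/** Averages per-query metrics; the mean of reciprocal ranks is the MRR. */
function meanMetrics(perQuery: RetrievalMetrics[]): RetrievalMetrics {
  const n = perQuery.length;
  if (n === 0) return { precision: 0, recall: 0, mrr: 0 };
  const sum = perQuery.reduce(
    (acc, m) => ({
      precision: acc.precision + m.precision,
      recall: acc.recall + m.recall,
      mrr: acc.mrr + m.mrr,
    }),
    { precision: 0, recall: 0, mrr: 0 },
  );
  return { precision: sum.precision / n, recall: sum.recall / n, mrr: sum.mrr / n };
}
```

Conventions vary on the edge cases (empty relevance sets, fewer than K results), so the choices above are one reasonable option rather than a statement of what the harness does.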

CLI integration

  • Extended the unrag add command to support installing the eval battery
  • Registered the eval battery in the manifest

Documentation (apps/web/content/docs/eval/)

  • Overview of the eval harness and when to use it
  • Getting started guide for installation and first eval run
  • Dataset format specification
  • Metrics reference
  • Running evals guide
  • Comparing runs and tracking regressions
  • CI integration patterns
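The run-comparison and CI topics above come down to two checks: how each metric moved relative to a baseline run, and whether any metric fell below a required minimum. The sketch below illustrates that idea under stated assumptions. `MetricSummary`, `diffRuns`, `failedThresholds`, and the default tolerance of 0.01 are all hypothetical and do not reflect the actual report format or the runner's threshold options.

```typescript
// Illustrative only: these types and functions are assumptions, not the harness's API.
type MetricSummary = Record<string, number>;

type MetricDelta = {
  metric: string;
  baseline: number;
  candidate: number;
  delta: number;
  regressed: boolean;
};

/**
 * Compares two runs metric by metric. A metric counts as regressed when the
 * candidate is lower than the baseline by more than `tolerance`.
 * Metrics missing from the candidate run are skipped.
 */
function diffRuns(
  baseline: MetricSummary,
  candidate: MetricSummary,
  tolerance = 0.01,
): MetricDelta[] {
  return Object.keys(baseline)
    .filter((metric) => metric in candidate)
    .map((metric) => {
      const delta = candidate[metric] - baseline[metric];
      return {
        metric,
        baseline: baseline[metric],
        candidate: candidate[metric],
        delta,
        regressed: delta < -tolerance,
      };
    });
}

/**
 * Returns the names of metrics that fall below their minimum threshold.
 * A metric absent from the summary is treated as 0, so it fails any positive threshold.
 */
function failedThresholds(
  summary: MetricSummary,
  thresholds: MetricSummary,
): string[] {
  return Object.keys(thresholds).filter(
    (metric) => (summary[metric] ?? 0) < thresholds[metric],
  );
}
```

In a CI job, a script built on checks like these would typically exit with a non-zero status when `failedThresholds` returns a non-empty list or any delta is marked as regressed, so that the pipeline step fails.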

Tests

  • add-battery-eval.test.ts - CLI battery installation
  • eval-dataset.test.ts - Dataset parsing
  • eval-metrics.test.ts - Metrics computation
  • eval-report-diff.test.ts - Report diffing
  • eval-runner-thresholds.test.ts - Runner threshold enforcement

UI components

  • Added SystemBanner component to display experimental feature warnings in docs

Notes

  • The eval feature is marked as experimental. The core workflow is stable, but the dataset schema and report format may evolve based on real-world usage.
  • Removed the initial spec document (specs/EVAL_HARNESS_SPEC.md) as the feature is now implemented.

@pingSubhajit pingSubhajit self-assigned this Jan 9, 2026

@pingSubhajit pingSubhajit merged commit 4a132d5 into release/v0.2.9 Jan 9, 2026
3 checks passed
@pingSubhajit pingSubhajit deleted the feat/eval-harness branch January 9, 2026 20:35
pingSubhajit added a commit that referenced this pull request Jan 9, 2026
…L processing (#22)

* fix: wire url processing within image embed through existing fetch policy (#20)
* feat: Add evaluation harness battery for retrieval quality measurement (#21)
* chore: bump package minor version